Natural Language Engineering Techniques
Lecture 5
Text technologies
The semantic level: Anaphora - anaphoric phenomena and anaphora resolution
Lecture: Dan Cristea
Labs: Diana Trandabăț, Mihaela Onofrei, Daniela Gîfu, Ionuț Pistol

Overview
• Anaphoric phenomena
• RARE – a framework for general anaphora resolution

This course is based on the article:
Cristea, D.; Postolache, O. D. (2005): How to deal with wicked anaphora, in António Branco, Tony McEnery and Ruslan Mitkov (editors): Anaphora Processing: Linguistic, Cognitive and Computational Modelling, Series IV – Current Issues in Linguistic Theory, vol. 263, Benjamin Publishing Books, Amsterdam/Philadelphia, pp. 17-46.
For online access, follow this link: https://profs.info.uaic.ro/~dcristea/papers/CristeaPostolache%20-%20WickedAnaphora.pdf

Anaphoric phenomena

Anaphora and coreference
• Two lexical strings are anaphoric if the mental processing of the second (the anaphor) depends on the mental processing of the first (the antecedent)
  I'm right now beginning a talk on AR. The audience seems to be still happy.
• Two lexical strings corefer if they mean the same thing (entity)
  John met Maria for the first time when he was a student.

Anaphora = coreference?
• Not all anaphoric links are coreferences:
  The car stopped. John tried to fix the motor.
• Not all coreferences are anaphoric:
  The sun is shining today… I began to read a book about Amenophis the IVth, the Egyptian pharaoh, son of the sun.

Why is resolution difficult?
• gender mismatch: resolution by semantic features
  Es Su Majestad suprema …, él se mostró muy emocionado
  (Spanish: the grammatically feminine “Su Majestad” is resumed by the masculine pronoun “él”)
• number mismatch: the government… the ministers
• pronouns – poor semantic features
  he [+animate, +person, +male, +singular]
  she [+animate, +person, +female, +singular]
  it [-animate, +singular]
  they [+plural]

Why is resolution difficult: cataphora
“From the corner of the divan of Persian saddle-bags on which he was lying, smoking, as was his custom, innumerable cigarettes, Lord Henry Wotton could just catch the gleam of the honey-sweet and honey-coloured blossoms of a laburnum…”
(O. Wilde, The Picture of Dorian Gray)

No difference between anaphora and cataphora in discourse processing
• Introduction of an empty/representation-poor discourse entity
• Addition of new features as discourse unfolds
Pronoun anticipation in Romanian
  En: I taught Gabriel to read = Ro: L-am învăţat pe Gabriel să citească
  Him ∅ taught Gabriel to read

Component 1: the set of primary attributes
a. morphological
  - number
  - lexical gender
  - person

Component 1: the set of primary attributes
b. syntactic
  - full syntactic description of REs as constituents of a syntactic tree
    [Lappin and Leass, 1994]; CT-based approaches [Grosz, Joshi and Weinstein, 1995], [Brennan, Friedman and Pollard, 1987]; syntactic domain based approaches [Chomsky, 1981], [Reinhart, 1981], [Gordon and Hendrick, 1998], [Kennedy and Boguraev, 1996]
  - quality of being adjunct, embedded or complement of a preposition
    [Kennedy and Boguraev, 1996]
  - inclusion or not in an existential construction
    [Kennedy and Boguraev, 1996]
  - syntactic patterns in which the RE is involved
    syntactic parallelism [Kennedy and Boguraev, 1996], [Mitkov, 1997]

Component 1: the set of primary attributes
c. semantic and lexical
  - position of the head of the RE in a conceptual hierarchy, animacy, sex (or natural gender), concreteness
    WordNet-based models [Poesio, Vieira and Teufel, 1997]
  - inclusion in a synonymy class
  - semantic roles, out of which selectional restrictions, inferential links, pragmatic
limitations, semantic parallelism and object preference can be verified

Component 1: the set of primary attributes
d. positional
  - offset of the first token of the RE in the text
    [Kennedy and Boguraev, 1996]
  - inclusion in an utterance, sentence or clause, considered as a discourse unit
    [Hobbs, 1987], [Azzam, Humphreys and Gaizauskas, 1998], [Cristea et al., 2000]

Component 1: the set of primary attributes
e. surface realisation (type)
  - the domain of this feature contains: zero-pronoun, clitic pronoun, full pronoun, reflexive pronoun, possessive pronoun, demonstrative pronoun, reciprocal pronoun, expletive “it”, bare noun (undetermined NP), indefinite determined NP, definite determined NP, proper noun (name)
    [Gordon and Hendrick, 1998], [Cristea et al., 2000]

Component 1: the set of primary attributes
f. other
  - inclusion or not of the RE in a specific lexical field (“domain concept”) [Mitkov, 1997]
  - frequency of the term in the text [Mitkov, 1997]
  - occurrence of the term in a heading [Mitkov, 1997]

Component 2: a set of knowledge sources
• A knowledge source: a (virtual) processor able to fetch values for attributes on the projection layer
  [Kennedy and Boguraev, 1996]: a marker of syntactic function and a set of patterns to recognise the expletive “it” (near specific sets of verbs or as subject of adjectives with clausal complements)
  [Azzam, Humphreys and Gaizauskas, 1998]: a syntactic analyser, a semantic analyser, and an elementary events finder
  [Gordon and Hendrick, 1998]: a surface realisation identifier and a syntactic parser
  [Hobbs, 1978]: a syntactic analyser, a surface realisation identifier and a set of axioms to determine semantic roles and relations of lexical items
• Minimum set: POS-tagger + shallow parser

Component 3: a set of matching heuristics or rules
• Certifying Rules (applied first): certify without ambiguity a possible candidate
• Demolishing Rules (applied afterwards): rule out a possible candidate
• Scored Rules: increase/decrease a resolution score
associated with a pair

Component 3: a set of matching heuristics or rules
• [Kennedy and Boguraev, 1996]: a pronoun cannot co-refer with a constituent (NP) which contains it (in “the child of his brother”, “his” is neither “child” nor “brother”). The remaining candidates are sorted by weighing a set of attribute-value pairs (linguistically and experimentally justified)
• [Gordon and Hendrick, 1997]: the antecedent's syntactic prominence (a notion related to the relative distance in a syntactic tree) influences the selection of the co-referential candidate
• [Gordon and Hendrick, 1998]: the salience of the relations between names and pronouns is calculated by using a graduation of surface realisation pairs: name-pronoun > name-name > pronoun-name

Component 4: the domain of referential accessibility
Filter and order the candidate DEs:
a. Linearly: Dorepaal, Mitkov
b. Hierarchically: Grosz & Sidner; Cristea, Ide & Romary

DRA: linear search order

DRA: hierarchical search order

RARE – The engine
For each RE on the text level (left to right):
• projection phase
• proposing/evoking phase
• completion phase
• re-evaluation phase

How can wicked anaphora be accommodated by the framework?

Number disagreement
• A plural pronoun identifying a conjunction/disjunction of singular/plural NPs or a split antecedent
  John waited for Maria. They went for a pizza.
  John waited for Maria. He invited her for a pizza.

Example of RARE rules

RARE in multilingual exercises
• Romanian
• English
• Polish
• Greek
• Bulgarian
• German
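
The three rule types of Component 3 and the linear search order of Component 4 can be illustrated with a minimal Python sketch. This is not the RARE implementation. The `RE` class, the function names, the single certifying rule (identical proper names) and all score weights are assumptions made for illustration only. The demolishing rule is the containment constraint quoted from [Kennedy and Boguraev, 1996].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RE:
    """A referential expression with a few primary attributes (hypothetical layout)."""
    text: str
    offset: int                      # positional: token offset of the RE in the text
    number: str                      # morphological: "sg" or "pl"
    gender: Optional[str]            # "m", "f", or None when unknown
    kind: str                        # surface realisation: "name", "pronoun" or "np"
    container: Optional[int] = None  # offset of the NP that contains this RE, if any


def certifies(anaphor: RE, cand: RE) -> bool:
    """Certifying rule (assumed example): identical proper names corefer."""
    return anaphor.kind == "name" and cand.kind == "name" and anaphor.text == cand.text


def demolishes(anaphor: RE, cand: RE) -> bool:
    """Demolishing rule: a pronoun cannot co-refer with an NP which contains it."""
    return anaphor.kind == "pronoun" and anaphor.container == cand.offset


def score(anaphor: RE, cand: RE) -> float:
    """Scored rules: agreement raises the score, distance lowers it (weights are arbitrary)."""
    s = 1.0 if anaphor.number == cand.number else -1.0
    if anaphor.gender and cand.gender:
        s += 1.0 if anaphor.gender == cand.gender else -1.0
    s -= 0.1 * (anaphor.offset - cand.offset)
    return s


def resolve(anaphor: RE, candidates: List[RE]) -> Optional[RE]:
    """Pick an antecedent using a linear domain of referential accessibility."""
    # Only preceding REs are accessible, most recent first.
    accessible = sorted((c for c in candidates if c.offset < anaphor.offset),
                        key=lambda c: -c.offset)
    # Certifying rules are applied first.
    for c in accessible:
        if certifies(anaphor, c):
            return c
    # Demolishing rules then rule out candidates; scored rules rank the rest.
    remaining = [c for c in accessible if not demolishes(anaphor, c)]
    if not remaining:
        return None
    return max(remaining, key=lambda c: score(anaphor, c))


# "John met Maria for the first time when he was a student"
john = RE("John", 0, "sg", "m", "name")
maria = RE("Maria", 2, "sg", "f", "name")
he = RE("he", 8, "sg", "m", "pronoun")
print(resolve(he, [john, maria]).text)  # John
```

In this sketch a gender or number mismatch only lowers the score instead of excluding the candidate, so cases such as "the government… the ministers" remain resolvable in principle. A hierarchical domain of referential accessibility would replace only the `accessible` ordering.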